Comparative non-targeted metabolomics comes of age through an increasing number of commercial vendors offering reproducible high-quality metabolomic data for translational researchers outside the mass spectrometry field. The MetaboDiff package aims to provide a low-level entry to differential metabolomic analysis by starting off with the table of relative metabolite quantifications provided by commercial vendors or core facilities.
MetaboDiff 0.1.0
The MetaboDiff R package requires R version \(\geq\) 3.4.
CRAN occasionally fails to compile the WGCNA package for Mac OS X. Hence, it is recommended to install the package before installing MetaboDiff:
install.packages("WGCNA")
If asked, install the package from source. Alternatively you might need to download the package from the developer and install the package locally:
install.packages(path_to_file, repos = NULL, type="source")
If you encounter problems installing WGCNA, please refer to the developer page. Please note that MetaboDiff can only be installed if WGCNA was successfully installed.
MetaboDiff can be installed via Github
library("devtools")
install_github("andreasmock/MetaboDiff")
and once installed loaded by
library("MetaboDiff")
The example data is derived from a study by Priolo and colleagues in which they used the service of Metabolon® to compare the tissue metabolome of 40 prostate cancers with 16 normal prostate specimens1 Priolo, C., Pyne, S., Rose, J., Regan, E. R., Zadra, G., Photopoulos, C., et al. (2014). AKT1 and MYC Induce Distinctive Metabolic Fingerprints in Human Prostate Cancer. Cancer Research, 74(24), 7198–7204. http://doi.org/10.1158/0008-5472.CAN-14-1490.
# load example data
data = as.matrix(read.csv(system.file("extdata",
"Priolo_data.csv",
package = "MetaboDiff")))
The metabolomic data within MetaboDiff are stored as a MultiAssayExperiment class2 Sig M (2017). MultiAssayExperiment: Software for the integration of multi-omics experiments in Bioconductor. R package version 1.2.1. This framework enables the coordinated representation of multiple experiments on partially overlapping samples with associated metadata and integrated subsetting across experiments. In the context of metabolomic data analysis, multiple assays are needed to store raw data and imputed data which usually contain different number of metabolites due to missing values (see section on data imputation for more details).
The core components of the MultiAssayExperiment class are:
ExperimentList - a slot of class ExperimentList containing data for each experimental assay. Within the ExperimentList slot, the metabolomic data are stored asassay - a matrix containing the relative metabolic measurements.rowData - a dataframe containing the metabolite annotation.colData - a slot of class dataframe describing the sample metadata available across all experiments.sampleMap - a slot of class dataframe relating clinical data to experimental assay.metadata - a slot of class list. Within MetaboDiff, this slot contains a list of dataframes summarizing the results from the comparative data analysis.Please refer to the MultiAssayExperiment vignette for more information.
assay = apply(data[4:nrow(data),8:ncol(data)],2,as.numeric)
colnames(assay) = paste0(rep("sample",ncol(assay)),1:ncol(assay))
rownames(assay) = paste0(rep("met",nrow(assay)),1:nrow(assay))
knitr::kable(assay[1:4,1:5])
| sample1 | sample2 | sample3 | sample4 | sample5 | |
|---|---|---|---|---|---|
| met1 | 33964.73 | 117318.43 | 118856.90 | 78670.7 | 102565.94 |
| met2 | 18505.56 | 167585.32 | 59621.97 | 66220.4 | 74892.27 |
| met3 | NA | 42373.93 | 27141.21 | NA | 38390.78 |
| met4 | 61638.77 | 74595.78 | NA | NA | NA |
colData = data.frame(id = colnames(data[,8:ncol(data)]),
tumor_normal = as.vector(t(data[1,8:ncol(data)])),
row.names=paste0(rep("pat",ncol(assay)),1:ncol(assay)))
To showcase the full functionality of MetaboDiff, a second random group label “random_gender” is generated.
#set seed for reproducibility
set.seed(1)
colData$random_gender = sample(c("random_male","random_female"),size = nrow(colData),replace = TRUE)
knitr::kable(head(colData))
| id | tumor_normal | random_gender | |
|---|---|---|---|
| pat1 | cp2 | N | random_male |
| pat2 | cp7 | N | random_male |
| pat3 | cp19 | N | random_female |
| pat4 | cp26 | N | random_female |
| pat5 | cp29 | N | random_male |
| pat6 | cp32 | N | random_female |
The example data set comprises metabolite annotation provided by the commercial vendor:
rowData = as.data.frame(data[4:nrow(data),1:7])
colnames(rowData) = data[3,1:7]
colnames(rowData)[7] = "HMDB_ID"
rownames(rowData) = paste0(rep("met",nrow(assay)),1:nrow(assay))
colnames(rowData)
## [1] "BIOCHEMICAL" "SUPER_PATHWAY" "SUB_PATHWAY" "METABOLON_ID"
## [5] "PLATFORM" "KEGG_ID" "HMDB_ID"
Alongside the metabolic measurements, Metabolon® provides metabolite annotation including so called super-pathways and sub-pathways. However, these annotations might very from vendor to vendor. Hence, the MetaboDiff package makes use of the Small Molecular Pathway Database (SMPDB) for metabolite annotation3 Jewison T, Su Y, Disfany FM, et al. SMPDB 2.0: Big Improvements to the Small Molecule Pathway Database Nucleic Acids Res. 2014 Jan;42(Database issue):D478-84.. MetaboDiff supports all common metabolic ids as input for the annotation (HMDB, KEGG and ChEBI). In the example data, both the KEGG and the HMDB identifier are available. As the databases (HMDB, KEGG or ChEBI) cover unique annotations due to different standards in identifying and reporting metabolites4 Thiele, I., Swainston, N., Fleming, R. M. T., Hoppe, A., Sahoo, S., Aurich, M. K., et al. (2013). A community-driven global reconstruction of human metabolism. Nature Biotechnology, 31(5), 419–425. http://doi.org/10.1038/nbt.2488, the function get_SMPDBanno queries the database using all three ids (if available) and joins all available information. The current SMPDB build used within MetaboDiff is version 2.0 and will be updated as new versions are released. Please refer to the development branch of MetaboDiff to work with the latest SMPDB annotation.
rowData = get_SMPDBanno(rowData,
column_kegg_id=6,
column_hmdb_id=7,
column_chebi_id=NA)
se = SummarizedExperiment(assays=assay,
rowData=rowData)
experiment_list = list(raw=se)
experiment_list
## $raw
## class: SummarizedExperiment
## dim: 307 86
## metadata(0):
## assays(1): ''
## rownames(307): met1 met2 ... met306 met307
## rowData names(22): BIOCHEMICAL SUPER_PATHWAY ... SMPDB|InChI
## SMPDB|InChI.Key
## colnames(86): sample1 sample2 ... sample85 sample86
## colData names(0):
sampleMap = data.frame(primary=rownames(colData),
colname=colnames(se))
sampleMap_list = listToMap(list(raw=sampleMap))
met = MultiAssayExperiment(experiments = experiment_list,
colData = colData,
sampleMap = sampleMap_list)
met
## A MultiAssayExperiment object of 1 listed
## experiment with a user-defined name and respective class.
## Containing an ExperimentList class object of length 1:
## [1] raw: SummarizedExperiment with 307 rows and 86 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert ExperimentList into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
In contrast to other high-throughput technologies, missing values are common in quantitative metabolomic datasets.
The function na_heatmap visualizes the missing metabolite measurements across the samples. The name of the column in colData for grouping and the label colors for the two groups need to be specified.
na_heatmap(met,
group_factor="tumor_normal",
label_colors=c("darkseagreen","dodgerblue"))
Figure 1: Missing metabolic measurements across the example data set
Missing measurements are visualized by a binary heatmap and barplots summarizing the fraction of missing measurement per metabolite and sample, respectively. In addition, the group label of interest (tumor (T) vs. normal (N)) is visualized. By default, metabolite measurements are only imputed if less than 40 percent are missing.
The example data supports the need for data imputation (figure 1). It could be shown that k-nearest neighbor imputation minimizes the effects on the normality and variance of the data as long as the number of missing data does not exceed 40%5 Armitage, E. G., Godzien, J., Alonso-Herranz, V., L pez-Gonz lvez, N., & Barbas, C. (2015). Missing value imputation strategies for metabolomics data. Electrophoresis, 36(24), 3050–3060. http://doi.org/10.1002/elps.201500352.
The function knn_impute adds the slot “impute” to the MultiAssayExperiment object that contains the imputed relative metabolite measurements for all metabolites with raw measurements in more than 60% of cases. We recommend a cutoff of 40% (i.e. 0.4). However the cutoff might be changed according to the discretion of the user.
met = knn_impute(met,cutoff=0.4)
met
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] raw: SummarizedExperiment with 307 rows and 86 columns
## [2] imputed: SummarizedExperiment with 238 rows and 86 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert ExperimentList into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
As apparent form the summary description of the object met the imputed data matrix contains only 238 of the 307 original metabolites according the cutoff of 40% missing values.
Before we normalize the data, we want to exclude outliers in the study set. To this end, the function outlier_heatmap is provided. The sample annotation shows the number of missing metabolites per sample as a proxy of the impact of imputation on clustering. To showcase outliers, the hierarchical clustering tree is cut in 2 clusters (figure 2).
outlier_heatmap(met,
group_factor="tumor_normal",
label_colors=c("darkseagreen","dodgerblue"))
Figure 2: Hierarchical clustering of metabolite measurements
The tree is cut in 2 clusters. The user has the choice to exclude one cluster which he thinks might represent outliers. To determine the effect of imputation, a barplot displaying the fraction of missing metabolites is shown in the column annotation of the heatmap.
The imputed data of the example study set displays a cluster of 5 samples (cluster 1) with in average lower relative metabolite measurements. Due to the lack of batch information, this cannot be investigated further at this time. To demonstrate how a cluster can be removed, we apply the function remove_cluster to remove cluster 1:
met = remove_cluster(met,cluster=1)
met
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] raw: SummarizedExperiment with 307 rows and 81 columns
## [2] imputed: SummarizedExperiment with 238 rows and 81 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert ExperimentList into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
As displayed in the summary of the met object, the 5 samples of cluster 1 were successfully removed from the slots “raw” and “imputed”.
Variance stabilizing normalization (vsn) is used to ensure that the variance remains nearly constant over the measured spectrum6 Huber, W., Heydebreck, von, A., Sültmann, H., Poustka, A., & Vingron, M. (2002). Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics, 18 Suppl 1, S96–104..
met = normalize_met(met)
met
## A MultiAssayExperiment object of 4 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 4:
## [1] raw: SummarizedExperiment with 307 rows and 81 columns
## [2] imputed: SummarizedExperiment with 238 rows and 81 columns
## [3] norm: SummarizedExperiment with 307 rows and 81 columns
## [4] norm_imputed: SummarizedExperiment with 238 rows and 81 columns
## Features:
## experiments() - obtain the ExperimentList instance
## colData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract colData columns, subset, or experiment
## *Format() - convert ExperimentList into a long or wide DataFrame
## assays() - convert ExperimentList to a SimpleList of matrices
At this point the data processing is completed with the MultiAssayExperiment object containing 4 slots:
quality_plot(met,
group_factor="tumor_normal",
label_colors=c("darkseagreen","dodgerblue"))
Figure 3: Quality control plot
Boxplot displaying the distribution of (A) raw, (B) imputed (C) normalized and (D) imputed and normalized relative metabolite measurements for all samples of the study set. Boxplots are colored according to the grouping of interest, i.e. tumor vs. normal.
The quality control plots shows the distribution of raw and normalized metabolic measurements for every sample in the study set. As aimed, the distribution of measurements is comparable across the study set for the normalized and imputed data (figure 3D).
Part II of MetaboDiff comprises a set of functions for comparative data analysis that are purpose-built for the preprocessed MultiAssayExperiment object. All statistics obtained in this part will be saved in the metadata slot of the object. Certain plotting functions will only be available once the correponding statistics has been run, e.g. you will only be able to plot the volcano plot after executing the function for differential analysis (diff_test). Please note that the data analysis has been developed for metabolomic data sets of more than 100 metabolites. Hence, some functionality might not be available with smaller data sets < 50 metabolites.
To explore the metabolomic profiles between samples in an unsupervised fashion, tSNE analysis is performed as a non-linear dimension reduction technique.
tsne_plot(met,
group_factor="tumor_normal",
label_colors=c("darkseagreen","dodgerblue"))
Figure 4: tSNE plot
Samples are colored according to the group factor.
A plot of the first two dimensions (figure 4) does not reveal a distinct difference in the metabolomic profiles between the normal and tumor samples.
Differential analysis for individual metabolites is performed using Student’s T-Test. Correction for multiple testing is performed by independent hypothesis weighting (IHW7 Ignatiadis, N., Klaus, B., Zaugg, J. B., & Huber, W. (2016). Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. Nature Methods, 13(7), 577–580. http://doi.org/10.1038/nmeth.3885) with variance as a covariate.
Hypothesis testing will be performed for the sample grouping “tumor_normal”, as well as for the randomly generated grouping “random_gender”.
met = diff_test(met,
group_factors = c("tumor_normal","random_gender"))
The results of the hypothesis testing is saved in the metadata slot of the MultiAssayExperiment object.
str(metadata(met), max.level=2)
## List of 2
## $ ttest_tumor_normal :'data.frame': 238 obs. of 4 variables:
## ..$ pval : num [1:238] 0.0206 0.7808 0.0832 0.0432 0.5859 ...
## ..$ adj_pval : num [1:238] 0.18 0.911 0.219 0.158 0.716 ...
## ..$ fold_change: num [1:238] 0.2872 0.0366 -0.3936 -0.5391 -0.1646 ...
## ..$ var : num [1:238] 0.264 0.287 0.872 1.21 1.516 ...
## $ ttest_random_gender:'data.frame': 238 obs. of 4 variables:
## ..$ pval : num [1:238] 0.2318 0.8626 0.4048 0.0121 0.2111 ...
## ..$ adj_pval : num [1:238] 0.83 0.959 0.862 0.386 0.83 ...
## ..$ fold_change: num [1:238] 0.1372 0.0208 -0.1742 -0.607 -0.3438 ...
## ..$ var : num [1:238] 0.264 0.287 0.872 1.21 1.516 ...
The columns of the result dataframe are the unadjusted p-value (pval), the adjusted p-value by independent hypothesis weighting (adj_pval), the Fold-Change between groups (fold_change) and the variance (var).
The comparative analysis can be visualized by means of a volcano plot (figure 5).
volcano_plot(met,
group_factor="tumor_normal",
label_colors=c("darkseagreen","dodgerblue"))
Figure 5: Volcano plot of comparison tumor vs normal
The dashed lines mark the significance thresholds adjusted p-value<0.05 and fold-change>1.5. Significant metabolites are colored.
As a sanity check, we also display the volcano plot for the random grouping “random_gender” for which we do not expect the same number of significant metabolites.
volcano_plot(met,
group_factor="random_gender",
label_colors=c("brown","orange"))
Figure 6: Volcano plot of random comparison
The dashed lines mark the significance thresholds adjusted p-value<0.05 and fold-change>1.5.
As expected, no metabolite was significantly different between the randomly assigned grouping male vs. female after multiple testing with a cutoff of p-value<0.05 (figure 6).
In this part of the analysis, we will exploit the Small Molecular Pathway Database (SMPDB) annotation to explore metabolic pathways and perform an enrichment analysis.
The annotations from the SMPDB have the prefix “SMPDB|”. The remaining annotation was provided by the commercial vendor that generated the example data (i.e. Metabolon®).
colnames(rowData(met[["norm_imputed"]]))
## [1] "BIOCHEMICAL" "SUPER_PATHWAY" "SUB_PATHWAY"
## [4] "METABOLON_ID" "PLATFORM" "KEGG_ID"
## [7] "HMDB_ID" "SMPDB|SMPDB.ID" "SMPDB|Pathway.Name"
## [10] "SMPDB|Pathway.Subject" "SMPDB|Metabolite.ID" "SMPDB|Metabolite.Name"
## [13] "SMPDB|HMDB.ID" "SMPDB|KEGG.ID" "SMPDB|ChEBI.ID"
## [16] "SMPDB|DrugBank.ID" "SMPDB|CAS" "SMPDB|Formula"
## [19] "SMPDB|IUPAC" "SMPDB|SMILES" "SMPDB|InChI"
## [22] "SMPDB|InChI.Key"
For instance, this annotation can be used to explore the variance in pathways across the data set (figure 7).
variance_boxplot(met,
rowAnnotation = "SMPDB|Pathway.Name")
Figure 7: Pathway-wide distribution of normalized relative metabolite measurements
The pathway annotation from the Small Molecular Pathway Database (SMPDB) was used.
MetaboDiff refrains from ‘classical’ pathway enrichment analyses, as they are limited by the need to define a priori metabolite sets for evaluation and ignore the topology of interactions. Hence, MetaboDiff offers the analysis of metabolic correlation networks.
This section describes the generation and analysis of metabolic correlation networks. The workflow was adapted from the weighted gene co-expression analysis (WGCNA) proposed by Langfelder and Horvarth8 Langfelder, P., & Horvath, S. (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics, 9, 559–559. http://doi.org/10.1186/1471-2105-9-559.
The first step in constructing a metabolic correlation network is the creation of a dissimilarity matrix. Biweight midcorrelation was used as a similiarity measure as it is more robust to outliers than the absolute correlation coefficient9 Zheng, C.-H., Yuan, L., Sha, W., & Sun, Z.-L. (2014). Gene differential coexpression analysis based on biweight correlation and maximum clique. BMC Bioinformatics, 15 Suppl 15(Suppl 15), S3. http://doi.org/10.1186/1471-2105-15-S15-S3. This choice is important, as we do not expect metabolites to be correlated in all patients.
The core concept of the so called “weighted” correlation analysis by Langfelder and Horvarth is that instead of defining a “hard” threshold (e.g. an absolute correlation coefficient > 0.8) to decide whether a node to connected to another, the adjacency a is defined by raising the similarity s to a power beta (“soft” threshold):
Lastly, the dissimilarity measure w is defined by
For detailed rational of this approach, please see Zhang and Horvath10 Zhang, B., & Horvath, S. (2005). A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology, 4(1), Article17. http://doi.org/10.2202/1544-6115.1128. For metabolic networks, we identified that a beta value of 3 was the lowest power for which the scale-free topology of the topology was met.
The function diss_matrix creates the dissimilarity measure for the met objects and saves it in the metadata slot
met = diss_matrix(met)
str(metadata(met), max.level=1)
## List of 3
## $ ttest_tumor_normal :'data.frame': 238 obs. of 4 variables:
## $ ttest_random_gender:'data.frame': 238 obs. of 4 variables:
## $ diss_matrix : num [1:238, 1:238] 0 0.964 0.999 0.999 0.996 ...
## ..- attr(*, "dimnames")=List of 2
To identify metabolic correlation modules, metabolites are next clustered based on the dissimilarity measure where branches of the dendrogram correspond to modules. Ultimately, modules are detected by applying a branch cutting method with a minimal module size of 5 metabolites. We employed the dynamic branch cut method developed by Langfelder and colleagues11 Langfelder, P., Zhang, B., & Horvath, S. (2008). Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics, 24(5), 719–720. http://doi.org/10.1093/bioinformatics/btm563., as constant height cutoffs exhibit suboptimal performance on complicated dendrograms. Figure 8 shows the hierarchical clustering and corresponding modules after branch cutting.
met = identify_modules(met,
min_module_size=5)
## ..cutHeight not given, setting it to 0.991 ===> 99% of the (truncated) height range in dendro.
## ..done.
WGCNA::plotDendroAndColors(metadata(met)$tree,
metadata(met)$module_color_vector,
'Module colors',
dendroLabels = FALSE,
hang = 0.03,
addGuide = TRUE,
guideHang = 0.05, main='')
Figure 8: Hierarchical clustering of metabolites
The different colors represent the modules identified by the dynamic branch cutting method.
The relation between the identified metabolic correlation modules can be visualized by a dendrogram of their eigengenes (figure 9). The module eigengene is defined as the first principal component of its expression matrix. It could be shown that the module eigengene is highly correlated with the gene that has the highest intramodular connectivity12 Horvath, S., & Dong, J. (2008). Geometric Interpretation of Gene Coexpression Network Analysis. PLoS Computational Biology (PLOSCB) 4(8), 4(8), e1000117–e1000117. http://doi.org/10.1371/journal.pcbi.1000117.
par(mar=c(2,2,2,2))
ape::plot.phylo(ape::as.phylo(metadata(met)$METree),
type = 'fan',
show.tip.label = FALSE,
main='')
ape::tiplabels(frame = 'circle',
col='black',
text=rep('',length(unique(metadata(met)$modules))),
bg = WGCNA::labels2colors(0:21))
Figure 9: Hierarchy of metabolic correlation modules as revealed by the clustering of module eigengenes
Each node represents a metabolic correlation module.
# number of metabolites per module
table(metadata(met)$modules)
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
## 2 30 22 20 15 14 14 13 12 11 10 9 9 8 8 8 7 6 5 5 5 5
To enable a better interpretation of metabolic correlation modules, modules are named according to the most abundant pathway annotation in a module (figure 10).
# calculate module significance
met = name_modules(met,
pathway_annotation = "SUB_PATHWAY")
# plot phylogram with names
ape::plot.phylo(ape::as.phylo(metadata(met)$METree), cex=0.9)
Figure 10: Hierarchy of metabolic correlation modules
Modules are named according to the most abundant pathway annotation in a module. Module 0 comprises the two metabolites without a significant interaction.
An advantage of correlation network analysis is the possibility to integrate external information. At the lowest hierarchical level, gene significance (GS) measures can be defined as the statistical significance (i.e. p-value, \(p_i\)) between the \(i\)-th node profile (metabolite) \(x_i\) and the sample trait \(T\)
\[\begin{equation} GS_i = -log~p_i \end{equation}\]Module significance in turn can be determined as the average absolute gene significance measure. This conceptual framework can be adapted to any research question.
met = calculate_MS(met,
group_factors = c("tumor_normal","random_gender"))
str(metadata(met), max.level=1)
## List of 13
## $ ttest_tumor_normal :'data.frame': 238 obs. of 4 variables:
## $ ttest_random_gender:'data.frame': 238 obs. of 4 variables:
## $ diss_matrix : num [1:238, 1:238] 0 0.964 0.999 0.999 0.996 ...
## ..- attr(*, "dimnames")=List of 2
## $ tree :List of 7
## ..- attr(*, "class")= chr "hclust"
## $ modules : num [1:238] 3 5 13 1 11 1 11 11 13 1 ...
## $ module_color_vector: chr [1:238] "brown" "green" "salmon" "turquoise" ...
## $ MEs :'data.frame': 81 obs. of 22 variables:
## $ MEDiss : num [1:22, 1:22] 0 1.038 1.073 1.055 0.973 ...
## ..- attr(*, "dimnames")=List of 2
## $ METree :List of 7
## ..- attr(*, "class")= chr "hclust"
## $ module_colors : chr [1:22] "yellow" "lightgreen" "grey" "greenyellow" ...
## $ MM : num [1:238, 1:22] 0.32992 0.00722 0.09016 0.00808 0.12118 ...
## ..- attr(*, "dimnames")=List of 2
## $ MS_tumor_normal :'data.frame': 22 obs. of 2 variables:
## $ MS_random_gender :'data.frame': 22 obs. of 2 variables:
Figure 11 shows that metabolic correlation module 2 (Creatine metabolism / Glutathione metabolism) was significantly associated with the tumor vs. normal comparison in the example data.
MS_plot(met,
group_factor="tumor_normal")
Figure 11: Association of tumor vs normal trait with metabolic correlation modules
Module 2 (Creatine metabolism / Glutathione metabolism) showed a significant module significance.
In line with the volcano plots, the random grouping did not result in a significant association (figure 12)
MS_plot(met,
group_factor="random_gender")
Figure 12: Association of random trait with metabolic correlation modules
No significant module significance can be observed.
To explore the role of metabolites within a correlation module, Langfelder and Horvath suggest a ‘fuzzy’ measure of module membership defined as
\[\begin{equation} K^q = |cor(x_i,E^q)| \end{equation}\]where \(x_i\) is the profile of gene \(i\) and \(E^q\) is the eigengene of module \(q\). Based on this definition, \(K\) describes how closely related metabolite \(i\) is to module \(q\). A meaningful visualization is consequently plotting the module membership over the p-value of the respective GS measure. As a third dimension, the dot-size is weighted according to the effect size, i.e. absolute fold-change between the grouping of interest.
MOI_plot(met,
group_factor="tumor_normal",
MOI=2) + xlim(c(0,5))
Figure 13: Exploration of module of interest
Module membership is plotted over the -log10 p-value. The dashed line marks the significance threshold of adjusted p-value<0.05.
sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] grid parallel stats4 stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] ape_4.1 WGCNA_1.60
## [3] fastcluster_1.1.22 dynamicTreeCut_1.63-1
## [5] IHW_1.4.0 factoextra_1.0.4
## [7] cluster_2.0.6 cowplot_0.7.0
## [9] genefilter_1.58.1 RColorBrewer_1.1-2
## [11] HiveR_0.2.55 vsn_3.44.0
## [13] impute_1.50.1 ggfortify_0.4.1
## [15] dplyr_0.7.2 purrr_0.2.2.2
## [17] readr_1.1.1 tidyr_0.6.3
## [19] tibble_1.3.3 tidyverse_1.1.1
## [21] gdata_2.18.0 BiocStyle_2.4.1
## [23] rmarkdown_1.6 MetaboDiff_0.1.0
## [25] devtools_1.13.3 ggplot2_2.2.1
## [27] ComplexHeatmap_1.14.0 MultiAssayExperiment_1.2.1
## [29] SummarizedExperiment_1.6.3 DelayedArray_0.2.4
## [31] matrixStats_0.52.2 Biobase_2.37.2
## [33] GenomicRanges_1.28.2 GenomeInfoDb_1.12.0
## [35] IRanges_2.10.2 S4Vectors_0.14.3
## [37] BiocGenerics_0.22.0
##
## loaded via a namespace (and not attached):
## [1] shinydashboard_0.6.1 robust_0.4-18 RSQLite_2.0
## [4] AnnotationDbi_1.38.1 htmlwidgets_0.8 combinat_0.0-8
## [7] trimcluster_0.1-2 munsell_0.4.3 animation_2.5
## [10] codetools_0.2-15 preprocessCore_1.38.1 withr_1.0.2
## [13] colorspace_1.3-2 BiocInstaller_1.27.2 highr_0.6
## [16] knitr_1.16 leaps_3.0 robustbase_0.92-7
## [19] labeling_0.3 slam_0.1-40 GenomeInfoDbData_0.99.0
## [22] lpsymphony_1.4.1 mnormt_1.5-5 bit64_0.9-7
## [25] rprojroot_1.2 coda_0.19-1 clusterGeneration_1.3.4
## [28] diptest_0.75-7 R6_2.2.2 doParallel_1.0.10
## [31] flexmix_2.3-14 bitops_1.0-6 assertthat_0.2.0
## [34] scales_0.4.1 nnet_7.3-12 gtable_0.2.0
## [37] affy_1.54.0 phangorn_2.2.0 rlang_0.1.1
## [40] scatterplot3d_0.3-40 GlobalOptions_0.0.12 splines_3.4.0
## [43] lazyeval_0.2.0 acepack_1.4.1 broom_0.4.2
## [46] checkmate_1.8.2 yaml_2.1.14 reshape2_1.4.2
## [49] modelr_0.1.0 backports_1.1.0 httpuv_1.3.3
## [52] Hmisc_4.0-3 tools_3.4.0 bookdown_0.4
## [55] psych_1.7.5 affyio_1.46.0 Rcpp_0.12.12
## [58] plyr_1.8.4 base64enc_0.1-3 zlibbioc_1.22.0
## [61] RCurl_1.95-4.8 rpart_4.1-11 GetoptLong_0.1.6
## [64] viridis_0.4.0 ggrepel_0.6.5 haven_1.0.0
## [67] magrittr_1.5 data.table_1.10.4 circlize_0.4.0
## [70] mvtnorm_1.0-6 whisker_0.3-2 hms_0.3
## [73] mime_0.5 evaluate_0.10 xtable_1.8-2
## [76] XML_3.98-1.7 jpeg_0.1-8 mclust_5.3
## [79] readxl_1.0.0 gridExtra_2.2.1 shape_1.4.2
## [82] compiler_3.4.0 maps_3.2.0 htmltools_0.3.6
## [85] pcaPP_1.9-72 Formula_1.2-1 rrcov_1.4-3
## [88] expm_0.999-2 lubridate_1.6.0 DBI_0.7
## [91] MASS_7.3-47 fpc_2.1-10 Matrix_1.2-10
## [94] quadprog_1.5-5 bindr_0.1 igraph_1.0.1
## [97] forcats_0.2.0 pkgconfig_2.0.1 fit.models_0.5-14
## [100] numDeriv_2016.8-1 foreign_0.8-68 xml2_1.1.1
## [103] foreach_1.4.3 annotate_1.54.0 XVector_0.16.0
## [106] rvest_0.3.2 stringr_1.2.0 digest_0.6.12
## [109] tsne_0.1-3 phytools_0.6-00 msm_1.6.4
## [112] cellranger_1.1.0 fastmatch_1.1-0 htmlTable_1.9
## [115] dendextend_1.5.2 kernlab_0.9-25 shiny_1.0.3
## [118] gtools_3.5.0 modeltools_0.2-21 rjson_0.2.15
## [121] nlme_3.1-131 jsonlite_1.5 bindrcpp_0.2
## [124] viridisLite_0.2.0 limma_3.32.2 lattice_0.20-35
## [127] httr_1.2.1 plotrix_3.6-5 DEoptimR_1.0-8
## [130] survival_2.41-3 GO.db_3.4.1 glue_1.1.1
## [133] fdrtool_1.2.15 UpSetR_1.3.3 png_0.1-7
## [136] prabclus_2.2-6 iterators_1.0.8 bit_1.1-12
## [139] class_7.3-14 stringi_1.1.5 blob_1.1.0
## [142] latticeExtra_0.6-28 memoise_1.1.0